Supplementary Material: Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

Jia, Qi, Fan, Baoyu, Xu, Cong, Liu, Lu

Neural Information Processing Systems

This section provides a comprehensive overview of the CSMV dataset. The dataset spans more than two years, an extensive time range that allows for the inclusion of a diverse set of content and captures the evolution of sentiments over that period. The distribution of labels in the CSMV dataset is shown in Figure 1. In Figure 1a, the opinion labels are distributed as follows: positive - 47%, neutral - 42%, and negative - 11%. Negative comments are clearly in the minority.
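
For readers who want to reproduce this kind of summary, here is a minimal sketch of tallying an opinion-label distribution. The label counts below are hypothetical placeholders chosen to match the reported percentages, not the actual CSMV statistics.

# Minimal sketch: computing an opinion-label distribution like the one reported
# for CSMV (positive / neutral / negative). Counts are hypothetical placeholders.
from collections import Counter

labels = ["positive"] * 47 + ["neutral"] * 42 + ["negative"] * 11

counts = Counter(labels)
total = sum(counts.values())
for label, count in counts.most_common():
    print(f"{label}: {count / total:.0%}")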





Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding

Mao, Mingyang, Perez-Cabarcas, Mariela M., Kallakuri, Utteja, Waytowich, Nicholas R., Lin, Xiaomin, Mohsenin, Tinoosh

arXiv.org Artificial Intelligence

To effectively engage in human society, the ability to adapt, filter information, and make informed decisions in ever-changing situations is critical. As robots and intelligent agents become more integrated into human life, there is a growing opportunity, and need, to offload the cognitive burden from humans to these systems, particularly in dynamic, information-rich scenarios. To fill this critical need, we present Multi-RAG, a multimodal retrieval-augmented generation system designed to provide adaptive assistance to humans in information-intensive circumstances. Our system aims to improve situational understanding and reduce cognitive load by integrating and reasoning over multi-source information streams, including video, audio, and text. As an enabling step toward long-term human-robot partnerships, Multi-RAG explores how multimodal information understanding can serve as a foundation for adaptive robotic assistance in dynamic, human-centered situations. To evaluate its capability in a realistic human-assistance proxy task, we benchmarked Multi-RAG on the MMBench-Video dataset, a challenging multimodal video understanding benchmark. Our system achieves superior performance compared to existing open-source video large language models (Video-LLMs) and large vision-language models (LVLMs), while utilizing fewer resources and less input data. The results demonstrate Multi-RAG's potential as a practical and efficient foundation for future human-robot adaptive assistance systems in dynamic, real-world contexts.
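
To make the idea of multi-source retrieval-augmented generation concrete, here is an illustrative sketch, not the authors' implementation: per-modality observations are reduced to text snippets, scored against a query with a toy bag-of-words similarity, and the top matches are packed into a prompt for a downstream language model. The index contents, the similarity function, and all names are assumptions for illustration.

# Toy multi-source RAG step: retrieve the most relevant per-modality snippets
# for a query and assemble them into a prompt. A real system would use learned
# encoders and an actual LLM; this only shows the data flow.
import math
from collections import Counter

def bow_vector(text):
    # Toy bag-of-words "embedding".
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical multi-source index: each entry is a text snippet derived from one
# modality (video caption, audio transcript, on-screen text).
index = [
    {"modality": "video", "text": "a person places a kettle on the stove"},
    {"modality": "audio", "text": "the kettle starts whistling loudly"},
    {"modality": "text",  "text": "recipe step two: boil water for the tea"},
]

def retrieve(query, k=2):
    q = bow_vector(query)
    return sorted(index, key=lambda e: cosine(q, bow_vector(e["text"])), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(f"[{s['modality']}] {s['text']}" for s in retrieve(query))
    # The assembled prompt would be passed to an LLM for the final answer.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("what is the person boiling water for?"))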


Laugh, Relate, Engage: Stylized Comment Generation for Short Videos

Ouyang, Xuan, Wang, Senan, Wang, Bouzhou, Xiahou, Siyuan, Zhou, Jinrong, Li, Yuekang

arXiv.org Artificial Intelligence

Short-video platforms have become a central medium in the modern Internet landscape, where efficient information delivery and strong interactivity are reshaping user engagement and cultural dissemination. Among the various forms of user interaction, comments play a vital role in fostering community participation and enabling content re-creation. However, generating comments that are both compliant with platform guidelines and capable of exhibiting stylistic diversity and contextual awareness remains a significant challenge. We introduce LOLGORITHM, a modular multi-agent system (MAS) designed for controllable short-video comment generation. The system integrates video segmentation, contextual and affective analysis, and style-aware prompt construction. It supports six distinct comment styles: puns (homophones), rhyming, meme application, sarcasm (irony), plain humor, and content extraction. Powered by a multimodal large language model (MLLM), LOLGORITHM directly processes video inputs and achieves fine-grained style control through explicit prompt markers and few-shot examples. To support development and evaluation, we construct a bilingual dataset using official APIs from Douyin (Chinese) and YouTube (English), covering five popular video genres: comedy skits, daily life jokes, funny animal clips, humorous commentary, and talk shows. Evaluation combines automated metrics (originality, relevance, and style conformity) with a large-scale human preference study involving 40 videos and 105 participants. Results show that LOLGORITHM significantly outperforms baseline models, achieving preference rates of over 90% on Douyin and 87.55% on YouTube. This work presents a scalable and culturally adaptive framework for stylized comment generation on short-video platforms, offering a promising path to enhance user engagement and creative interaction.
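
The abstract mentions fine-grained style control through explicit prompt markers and few-shot examples; the sketch below illustrates that pattern in its simplest form. The style names, markers, and example comments are hypothetical stand-ins, not LOLGORITHM's actual prompts.

# Illustrative style-aware prompt construction: an explicit style marker plus a
# few-shot example steers generation toward one of the supported comment styles.
FEW_SHOT = {
    "pun": "Video: a cat knocks over a glass. Comment: That cat really raised the bar... then dropped it.",
    "sarcasm": "Video: a failed backflip. Comment: Truly the most graceful landing I've ever seen.",
    "plain_humor": "Video: a dog steals a sandwich. Comment: Fastest lunch delivery in town.",
}

def build_styled_prompt(video_summary: str, style: str) -> str:
    if style not in FEW_SHOT:
        raise ValueError(f"unsupported style: {style}")
    return (
        f"<style={style}>\n"            # explicit style marker
        f"Example: {FEW_SHOT[style]}\n"  # one few-shot example
        f"Video: {video_summary}\n"
        f"Comment:"
    )

# The resulting prompt would be sent to a multimodal LLM together with the video.
print(build_styled_prompt("a parrot imitates a phone ringtone", "sarcasm"))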


AVA: Towards Agentic Video Analytics with Vision Language Models

Yan, Yuxuan, Jiang, Shiqi, Cao, Ting, Yang, Yifan, Yang, Qianqian, Shu, Yuanchao, Yang, Yuqing, Qiu, Lili

arXiv.org Artificial Intelligence

AI-driven video analytics has become increasingly important across diverse domains. However, existing systems are often constrained to specific, predefined tasks, limiting their adaptability in open-ended analytical scenarios. The recent emergence of Vision Language Models (VLMs) as transformative technologies offers significant potential for enabling open-ended video understanding, reasoning, and analytics. Nevertheless, their limited context windows present challenges when processing ultra-long video content, which is prevalent in real-world applications. To address this, we introduce AVA, a VLM-powered system designed for open-ended, advanced video analytics. AVA incorporates two key innovations: (1) the near real-time construction of Event Knowledge Graphs (EKGs) for efficient indexing of long or continuous video streams, and (2) an agentic retrieval-generation mechanism that leverages EKGs to handle complex and diverse queries. Comprehensive evaluations on public benchmarks, LVBench and VideoMME-Long, demonstrate that AVA achieves state-of-the-art performance, attaining 62.3% and 64.1% accuracy, respectively, significantly surpassing existing VLM and video Retrieval-Augmented Generation (RAG) systems. Furthermore, to evaluate video analytics in ultra-long and open-world video scenarios, we introduce a new benchmark, AVA-100. This benchmark comprises 8 videos, each exceeding 10 hours in duration, along with 120 manually annotated, diverse, and complex question-answer pairs. On AVA-100, AVA achieves top-tier performance with an accuracy of 75.8%. The source code of AVA is available at https://github.com/I-ESC/Project-Ava. The AVA-100 benchmark can be accessed at https://huggingface.co/datasets/iesc/Ava-100.
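
The core indexing idea is that events extracted from a long stream are stored in a graph-like structure so that queries can jump to related events instead of rescanning the whole video. The sketch below is a toy stand-in for that idea under stated assumptions: the event records, the entity-based index, and the matching rule are illustrative, not AVA's actual EKG format or retrieval logic.

# Toy event index for a long video: events are keyed by the entities they
# mention, so a query retrieves related events without scanning the stream.
from collections import defaultdict

events = [
    {"t": 120.0, "entities": {"person", "car"}, "desc": "a person unlocks a car"},
    {"t": 340.5, "entities": {"car"}, "desc": "the car leaves the parking lot"},
    {"t": 900.0, "entities": {"person", "dog"}, "desc": "a person walks a dog"},
]

# Entity -> list of event indices: a minimal stand-in for an event knowledge graph.
entity_index = defaultdict(list)
for i, ev in enumerate(events):
    for ent in ev["entities"]:
        entity_index[ent].append(i)

def query_events(entity: str):
    """Return all events mentioning the entity, ordered by timestamp."""
    hits = [events[i] for i in entity_index.get(entity, [])]
    return sorted(hits, key=lambda e: e["t"])

for ev in query_events("car"):
    print(f"{ev['t']:>7.1f}s  {ev['desc']}")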

